Effective Diversity in Population Based Reinforcement Learning
Exploration is a key problem in reinforcement learning, since agents can only learn from data they acquire in the environment. With that in mind, maintaining a population of agents is an attractive method, as it allows data to be collected with a diverse set of behaviors. This behavioral diversity is often boosted via multi-objective loss functions. However, those approaches typically leverage mean field updates based on pairwise distances, which makes them susceptible to cycling behaviors and increased redundancy. In addition, explicitly boosting diversity often has a detrimental impact on optimizing already fruitful behaviors for rewards. As such, the reward-diversity trade-off typically relies on heuristics. Finally, such methods require behavioral representations, often handcrafted and domain specific. In this paper, we introduce an approach to optimize all members of a population simultaneously. Rather than using pairwise distance, we measure the volume of the entire population in a behavioral manifold, defined by task-agnostic behavioral embeddings.
Review for NeurIPS paper: Effective Diversity in Population Based Reinforcement Learning
Weaknesses: The paper may need to be improved to address a few important issues, as detailed below. Why is it important to enhance population-wide behavioral diversity? Intuitively I can understand the potential benefits related to deep exploration and learning stability. However, theoretically I cannot link the benefits straightforwardly to the proposed use of kernel function and the kernel matrix determinant. Theorem 3.3 states that when lambda is set properly, the population will contain M distinct optimal policies.
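The kernel-matrix determinant the review refers to can be illustrated with a minimal sketch (an assumption about the general technique, not the authors' implementation): embed each of the M agents in behavior space, build an RBF kernel matrix over the embeddings, and take its determinant as a population-level diversity score. Redundant agents make rows of the matrix nearly identical, collapsing the determinant toward zero. The function name `population_diversity` and the fixed lengthscale are hypothetical choices for illustration.

```python
import numpy as np

def population_diversity(embeddings, lengthscale=1.0):
    """Determinant-based diversity score for a population of agents.

    embeddings: (M, d) array, one behavioral embedding per agent.
    Returns det(K) for an RBF kernel matrix K; values lie in [0, 1],
    with 0 for identical agents and approaching 1 for well-separated ones.
    """
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * lengthscale ** 2))
    return np.linalg.det(K)

# Three identical agents: kernel matrix is all-ones, determinant is 0.
same = np.zeros((3, 2))
# Three well-separated agents: kernel matrix is near-identity, determinant near 1.
spread = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
```

In a multi-objective loss, such a score would be added to the summed population reward with a weight lambda, which is the trade-off parameter Theorem 3.3 is concerned with.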
Review for NeurIPS paper: Effective Diversity in Population Based Reinforcement Learning
This paper focuses on an interesting problem of maintaining diversity in a set of agents. The paper formalizes the problem clearly, and the initial results presented are positive and support the paper's claims. The paper is fairly well written. Among the aspects that could be improved: the experimental section lacks many details and no ablation studies are performed; it is not clear how the new technique compares from a computational cost perspective; and some of the implications of the assumptions made (e.g., on the scope of problems to which this approach can reasonably be applied) are not clearly stated. Overall, the paper was found to make a sufficient contribution and the final recommendation is to accept.
Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity
Marta Garnelo, Wojciech Marian Czarnecki, Siqi Liu, Dhruva Tirumala, Junhyuk Oh, Gauthier Gidel, Hado van Hasselt, David Balduzzi
Strategic diversity is often essential in games: in multi-player games, for example, evaluating a player against a diverse set of strategies will yield a more accurate estimate of its performance. Furthermore, in games with non-transitivities, diversity allows a player to cover several winning strategies. However, despite the significance of strategic diversity, training agents that exhibit diverse behaviour remains a challenge. In this paper we study how to construct diverse populations of agents by carefully structuring how individuals within a population interact. Our approach is based on interaction graphs, which control the flow of information between agents during training and can encourage agents to specialise on different strategies, leading to improved overall performance. We provide evidence for the importance of diversity in multi-agent training and analyse the effect of applying different interaction graphs on the training trajectories, diversity and performance of populations in a range of games. This is an extended version of the long abstract published at AAMAS.
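The idea of an interaction graph controlling information flow during training can be sketched minimally (a hedged illustration under assumed names, not the paper's actual training procedure): represent the graph as an adjacency list, and at each training step let every agent sample an opponent only from its out-neighbours. A fully connected graph recovers standard all-vs-all population training, while sparser graphs restrict who trains against whom.

```python
import random

def sample_opponents(graph, rng=random):
    """For each agent i, pick one training opponent among its out-neighbours
    in the interaction graph (adjacency-list form: agent -> list of agents).
    Agents with no out-neighbours are skipped."""
    return {i: rng.choice(nbrs) for i, nbrs in graph.items() if nbrs}

# Fully connected graph of 3 agents: every agent may face every other agent.
full = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
# Chain graph: agent 0 only trains against 1, agent 1 only against 2.
chain = {0: [1], 1: [2], 2: []}
```

With a single out-neighbour per agent the matchups are deterministic, e.g. `sample_opponents(chain)` always pairs 0 with 1 and 1 with 2, which is the kind of specialisation pressure the abstract attributes to restrictive graphs.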